{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Topic Modeling: Latent Dirichlet Allocation with gensim"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Gensim is a specialized NLP library with a fast LDA implementation and many additional features. We will also use it in the next chapter on word vectors (see the notebook lda_with_gensim for details)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Imports & Settings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-11-17T22:51:16.888440Z",
     "start_time": "2018-11-17T22:51:16.589557Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
   ],
   "source": [
    "import warnings\n",
    "from collections import OrderedDict\n",
    "from pathlib import Path\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Visualization\n",
    "from ipywidgets import interact, FloatSlider\n",
    "import matplotlib.pyplot as plt\n",
    "from matplotlib.ticker import FuncFormatter\n",
    "import seaborn as sns\n",
    "\n",
    "import pyLDAvis\n",
    "from pyLDAvis.sklearn import prepare\n",
    "\n",
    "from wordcloud import WordCloud\n",
    "from termcolor import colored\n",
    "\n",
    "# spacy for language processing\n",
    "import spacy\n",
    "\n",
    "# sklearn for feature extraction & modeling\n",
    "from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer, TfidfTransformer\n",
    "from sklearn.decomposition import LatentDirichletAllocation, TruncatedSVD, NMF\n",
    "from sklearn.model_selection import train_test_split\n",
    "import joblib  # `sklearn.externals.joblib` was removed in scikit-learn 0.23\n",
    "\n",
    "# gensim for alternative models\n",
    "from gensim.models import LdaModel, LdaMulticore\n",
    "from gensim.corpora import Dictionary\n",
    "from gensim.matutils import Sparse2Corpus"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T04:27:58.207682Z",
     "start_time": "2018-05-01T04:27:58.198244Z"
    }
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "plt.style.use('ggplot')\n",
    "plt.rcParams['figure.figsize'] = (14.0, 8.7)\n",
    "pyLDAvis.enable_notebook()\n",
    "warnings.filterwarnings('ignore')\n",
    "pd.options.display.float_format = '{:,.2f}'.format"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "## Load BBC data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# change to your data path if necessary\n",
    "DATA_DIR = Path('../data')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-11-30T16:00:39.606772Z",
     "start_time": "2018-11-30T16:00:39.503364Z"
    },
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "path = DATA_DIR / 'bbc'\n",
    "files = path.glob('**/*.txt')\n",
    "doc_list = []\n",
    "for file in files:\n",
    "    with file.open(encoding='latin1') as f:\n",
    "        topic = file.parts[-2]\n",
    "        lines = f.readlines()\n",
    "        heading = lines[0].strip()\n",
    "        body = ' '.join([l.strip() for l in lines[1:]])\n",
    "        doc_list.append([topic.capitalize(), heading, body])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "### Convert to DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T04:27:59.007837Z",
     "start_time": "2018-05-01T04:27:58.992529Z"
    },
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 2225 entries, 0 to 2224\n",
      "Data columns (total 3 columns):\n",
      "topic      2225 non-null object\n",
      "heading    2225 non-null object\n",
      "article    2225 non-null object\n",
      "dtypes: object(3)\n",
      "memory usage: 52.2+ KB\n"
     ]
    }
   ],
   "source": [
    "docs = pd.DataFrame(doc_list, columns=['topic', 'heading', 'article'])\n",
    "docs.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Create Train & Test Sets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T04:38:58.366229Z",
     "start_time": "2018-05-01T04:38:58.356918Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "train_docs, test_docs = train_test_split(docs, \n",
    "                                         stratify=docs.topic, \n",
    "                                         test_size=50, \n",
    "                                         random_state=42)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T04:38:58.372958Z",
     "start_time": "2018-05-01T04:38:58.368455Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((2175, 3), (50, 3))"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_docs.shape, test_docs.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T04:38:58.381455Z",
     "start_time": "2018-05-01T04:38:58.374872Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Sport            12\n",
       "Business         11\n",
       "Tech              9\n",
       "Entertainment     9\n",
       "Politics          9\n",
       "Name: topic, dtype: int64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.Series(test_docs.topic).value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Vectorize train & test sets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T04:38:59.033549Z",
     "start_time": "2018-05-01T04:38:58.383604Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<2175x2000 sparse matrix of type '<class 'numpy.int64'>'\n",
       "\twith 178572 stored elements in Compressed Sparse Row format>"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vectorizer = CountVectorizer(max_df=.2, \n",
    "                             min_df=3, \n",
    "                             stop_words='english', \n",
    "                             max_features=2000)\n",
    "\n",
    "train_dtm = vectorizer.fit_transform(train_docs.article)\n",
    "# scikit-learn >= 1.0; `get_feature_names()` was removed in 1.2\n",
    "words = list(vectorizer.get_feature_names_out())\n",
    "train_dtm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T04:38:59.052875Z",
     "start_time": "2018-05-01T04:38:59.035152Z"
    },
    "scrolled": true,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<50x2000 sparse matrix of type '<class 'numpy.int64'>'\n",
       "\twith 4160 stored elements in Compressed Sparse Row format>"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test_dtm = vectorizer.transform(test_docs.article)\n",
    "test_dtm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## LDA with gensim"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "### Using `CountVectorizer` Input"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T04:50:23.337553Z",
     "start_time": "2018-05-01T04:50:23.017269Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "max_df = .2\n",
    "min_df = 3\n",
    "max_features = 2000\n",
    "\n",
    "# used by sklearn: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_extraction/stop_words.py\n",
    "# `DataFrame.squeeze` replaces the `squeeze=True` argument removed from `read_csv` in pandas 2.0\n",
    "stop_words = (pd.read_csv('http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words', header=None)\n",
    "              .squeeze('columns')\n",
    "              .tolist())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T04:50:24.064327Z",
     "start_time": "2018-05-01T04:50:23.340576Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "vectorizer = CountVectorizer(max_df=max_df, \n",
    "                             min_df=min_df, \n",
    "                             stop_words='english', \n",
    "                             max_features=max_features)\n",
    "\n",
    "train_dtm = vectorizer.fit_transform(train_docs.article)\n",
    "test_dtm = vectorizer.transform(test_docs.article)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Convert sklearn DTM to gensim data structures"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Gensim's `Sparse2Corpus` facilitates the conversion of the document-term matrix (DTM) produced by sklearn into gensim data structures as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T04:50:24.070987Z",
     "start_time": "2018-05-01T04:50:24.066420Z"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "train_corpus = Sparse2Corpus(train_dtm, documents_columns=False)\n",
    "test_corpus = Sparse2Corpus(test_dtm, documents_columns=False)\n",
    "# scikit-learn >= 1.0; `get_feature_names()` was removed in 1.2\n",
    "id2word = pd.Series(vectorizer.get_feature_names_out()).to_dict()"
   ]
  },
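  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the target format concrete: gensim represents each document as a list of `(term_id, count)` pairs, and `id2word` maps integer ids back to terms. The plain-Python sketch below only illustrates that format. The helper `rows_to_bow`, the toy vocabulary, and the toy counts are hypothetical, and this is not how `Sparse2Corpus` is implemented.\n",
    "\n",
    "```python\n",
    "def rows_to_bow(count_rows):\n",
    "    # Hypothetical helper: each row holds the term counts of one document.\n",
    "    # Keep only non-zero counts as (term_id, count) pairs.\n",
    "    return [[(term_id, count) for term_id, count in enumerate(row) if count > 0]\n",
    "            for row in count_rows]\n",
    "\n",
    "\n",
    "vocabulary = ['market', 'oil', 'film']  # made-up vocabulary\n",
    "toy_dtm = [[2, 1, 0],                   # document 0\n",
    "           [0, 0, 3]]                   # document 1\n",
    "\n",
    "toy_corpus = rows_to_bow(toy_dtm)\n",
    "toy_id2word = dict(enumerate(vocabulary))\n",
    "print(toy_corpus)\n",
    "print(toy_id2word)\n",
    "```\n",
    "\n",
    "Zero counts are dropped, so each document only lists the terms it actually contains, which keeps a sparse corpus compact."
   ]
  },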
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Train Model & Review Results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Spells out LdaModel's main parameters for reference; the fitted 100-topic model is discarded\n",
    "LdaModel(corpus=train_corpus, \n",
    "         num_topics=100, \n",
    "         id2word=None, \n",
    "         distributed=False, \n",
    "         chunksize=2000,                   # number of documents in each training chunk\n",
    "         passes=1,                         # number of passes through the corpus during training\n",
    "         update_every=1,                   # number of documents to iterate through per update (0 = batch learning)\n",
    "         alpha='symmetric',                # a-priori belief on the document-topic distribution\n",
    "         eta=None,                         # a-priori belief on the topic-word distribution\n",
    "         decay=0.5,                        # share of the previous lambda value forgotten when a new document is examined\n",
    "         offset=1.0,                       # controls how much the first few iterations are slowed down\n",
    "         eval_every=10,                    # estimate log perplexity every `eval_every` updates\n",
    "         iterations=50,                    # maximum number of iterations when inferring the topic distribution of a corpus\n",
    "         gamma_threshold=0.001,            # minimum change in the gamma parameters to continue iterating\n",
    "         minimum_probability=0.01,         # topics with a probability below this threshold are filtered out\n",
    "         random_state=None, \n",
    "         ns_conf=None, \n",
    "         minimum_phi_value=0.01,           # if `per_word_topics` is True, lower bound on term probabilities\n",
    "         per_word_topics=False,            # if True, also compute the most likely topics for each word, with phi values multiplied by word count\n",
    "         callbacks=None);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_topics = 5\n",
    "topic_labels = ['Topic {}'.format(i) for i in range(1, num_topics+1)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T05:07:27.311042Z",
     "start_time": "2018-05-01T05:05:23.051642Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "lda_gensim = LdaModel(corpus=train_corpus,\n",
    "                      num_topics=num_topics,\n",
    "                      id2word=id2word)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T05:04:28.529642Z",
     "start_time": "2018-05-01T05:02:33.896Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0,\n",
       " '0.008*\"search\" + 0.006*\"net\" + 0.006*\"mail\" + 0.005*\"yahoo\" + 0.005*\"labour\" + 0.005*\"web\" + 0.005*\"tax\" + 0.004*\"says\" + 0.004*\"information\" + 0.004*\"oil\"')"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "topics = lda_gensim.print_topics()\n",
    "topics[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Evaluate Topic Coherence\n",
    "\n",
    "Topic coherence measures whether the top words of a topic tend to co-occur in the same documents.\n",
    "\n",
    "- It adds up a score for each distinct pair of top-ranked words.\n",
    "- The score is the log of the probability that a document containing at least one instance of the higher-ranked word also contains at least one instance of the lower-ranked word.\n",
    "\n",
    "Large negative values indicate words that rarely co-occur; values closer to zero indicate words that co-occur more often."
   ]
  },
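  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The pairwise score described above can be sketched in plain Python. The function `umass_style_coherence` and the toy documents are hypothetical. The add-one smoothing follows the commonly used UMass formulation and is an assumption here, so the values gensim reports for `u_mass` may differ in detail. The sketch also assumes that every top word occurs in at least one document.\n",
    "\n",
    "```python\n",
    "from itertools import combinations\n",
    "from math import log\n",
    "\n",
    "\n",
    "def umass_style_coherence(top_words, documents):\n",
    "    # top_words: a topic's words, ordered from highest to lowest rank\n",
    "    # documents: iterable of token lists\n",
    "    doc_sets = [set(doc) for doc in documents]\n",
    "    doc_freq = {w: sum(w in d for d in doc_sets) for w in top_words}\n",
    "    score = 0.0\n",
    "    # combinations() preserves order, so `higher` always outranks `lower`\n",
    "    for higher, lower in combinations(top_words, 2):\n",
    "        co_freq = sum(higher in d and lower in d for d in doc_sets)\n",
    "        # smoothed log P(lower-ranked word present | higher-ranked word present)\n",
    "        score += log((co_freq + 1) / doc_freq[higher])\n",
    "    return score\n",
    "\n",
    "\n",
    "toy_docs = [['oil', 'prices', 'market'],\n",
    "            ['oil', 'market'],\n",
    "            ['film', 'award'],\n",
    "            ['film', 'oil']]\n",
    "\n",
    "related = umass_style_coherence(['oil', 'market'], toy_docs)\n",
    "unrelated = umass_style_coherence(['oil', 'award'], toy_docs)\n",
    "print(related, unrelated)\n",
    "```\n",
    "\n",
    "In this toy corpus the pair that shares documents scores closer to zero than the pair that never appears together, in line with the interpretation given above."
   ]
  },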
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T05:04:28.530886Z",
     "start_time": "2018-05-01T05:02:34.504Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "coherence = lda_gensim.top_topics(corpus=train_corpus, coherence='u_mass')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Gensim's `top_topics` method returns each topic's coherence score together with its most important words and their probabilities:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T05:04:28.531995Z",
     "start_time": "2018-05-01T05:02:36.466Z"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  Topic 1         Topic 2           Topic 3           Topic 4         Topic 5            \n",
      "     prob    term    prob      term    prob      term    prob    term    prob        term\n",
      "0   0.70%   games   0.56%    united   0.97%    labour   0.78%  search   0.70%     digital\n",
      "1   0.55%    game   0.52%        eu   0.81%     blair   0.62%     net   0.69%        wage\n",
      "2   0.49%    2004   0.38%       aid   0.63%     party   0.59%    mail   0.60%     minimum\n",
      "3   0.47%  market   0.38%  airlines   0.62%      film   0.51%   yahoo   0.58%    software\n",
      "4   0.46%  prices   0.38%     state   0.53%  minister   0.51%  labour   0.55%  technology\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAz4AAAIWCAYAAACWQ1g3AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3W9snfV99/GPYzcg4hJin5I0S1ImByrYlqZpkrWMFhrcJ/tDcw8G7saeoBUyqla0MCWhC3S4KWYQZRNlE7Qp6iiCDZZQNLFKsVBaAa0gTKFbEaCkGhU0JDjGyVz+NIl9P5hu32N2GsfHuU78y+v1yNc5l/37YX8T8c51neOm4eHh4QAAABRsWqM3AAAAcLwJHwAAoHjCBwAAKJ7wAQAAiid8AACA4gkfAACgeC2T8UV27NiRe++9N0NDQ7n44ouzcuXKdz1/8ODBfP3rX89Pf/rTvPe97811112XM888czKWBgAAOKq6r/gMDQ1l06ZNufHGG7Nx48Y8+eSTeeWVV951zuOPP54ZM2bkzjvvzO/93u/l/vvvr3dZAACAcas7fHbu3Jk5c+Zk9uzZaWlpyfnnn59nnnnmXeds3749F110UZLkox/9aP7jP/4jfm8qAABQlbrDp7+/P+3t7SPH7e3t6e/vP+I5zc3NOe200/Jf//Vf9S4NAAAwLnW/xmesKzdNTU3HfM7/09vbm97e3iRJT09PvdsDAACoP3za29uzb9++keN9+/Zl1qxZY57T3t6ew4cP580330xra+uYX6+zszOdnZ0jxz//+c/r3eJxcfizlzR6C1NS8zcebfQWpiTzNjHmbWLM28SYt4kxbxNj3ibOzE3MiTxzc+fOHdd5dd/q1tHRkd27d2fv3r05dOhQnnrqqSxduvRd53zkIx/Jtm3bkiQ/+tGP8hu/8RtHvOIDAAAw2eq+4tPc3Jyrrroq69evz9DQUD75yU9m/vz5+cd//Md0dHRk6dKlWbFiRb7+9a/n85//fFpbW3PddddNxt4BAADGZVJ+j8+SJUuyZMmSdz12xRVXjHw8ffr0fOlLX5qMpQAAAI5Z3be6AQAAnOiEDwAAUDzhAwAAFE/4AAAAxRM+AABA8YQPAABQPOEDAAAUT/gAAADFEz4AAEDxhA8AAFA84QMAABRP+AAAAMUTPgAAQPGEDwAAUDzhAwAAFE/4AAAAxRM+AABA8YQPAABQPOEDAAAUT/gAAADFEz4AAEDxhA8AAFA84QMAABRP+AAAAMUTPgAAQPGEDwAAUDzhAwAAFE/4AAAAxRM+AABA8YQPAABQPOEDAAAUT/gAAADFEz4AAEDxhA8AAFA84QMAABRP+AAAAMUTPgAAQPGEDwAAUDzhAwAAFE/4AAAAxRM+AABA8YQPAABQPOEDAAAUT/gAAADFEz4AAEDxhA8AAFA84QMAABRP+AAAAMUTPgAAQPGEDwAAULyWej55cHAwGzduzOuvv573ve99+eIXv5jW1tZR511xxRVZsGBBkqRWq2X16tX1LAsAAHBM6gqfRx55JL/1W7+VlStX5pFHHskjjzySK6+8ctR506dPz+23317PUgAAABNW161uzzzzTC688MIkyYUXXphnnnlmUjYFAAAwmeq64rN///7MmjUrSTJr1qwcOHBgzPMOHjyYNWvWpLm5OZ/+9KezfPnyepYFAAA4JkcNn+7u7gwMDIx6vKura9yL/N3f/V3a2tqyZ8+e3HLLLVmwYEHmzJkz5rm9vb3p7e1NkvT09KRWq417nSrtafQGpqgT9ed5ojNvE2PeJsa8TYx5mxjzNjHmbeLM3MSUMHNHDZ9169Yd8bmZM2fmjTfeyKxZs/LGG2/k9NNPH/O8tra2JMns2bNz3nnn5T//8z+PGD6dnZ3p7OwcOe7r6zvaFplC/DypknmjSuaNKpk3qnYiz9zcuXPHdV5dr/FZunRpvv/97ydJvv/9
72fZsmWjzhkcHMzBgweTJAcOHMiLL76YefPm1bMsAADAManrNT4rV67Mxo0b8/jjj6dWq+VLX/pSkmTXrl3ZunVrVq1alVdffTX33HNPpk2blqGhoaxcuVL4AAAAlaorfN773vfmpptuGvV4R0dHOjo6kiQf/OAHs2HDhnqWAQAAqEtdt7oBAABMBcIHAAAonvABAACKJ3wAAIDiCR8AAKB4wgcAACie8AEAAIonfAAAgOIJHwAAoHjCBwAAKJ7wAQAAiid8AACA4gkfAACgeMIHAAAonvABAACKJ3wAAIDiCR8AAKB4wgcAACie8AEAAIonfAAAgOIJHwAAoHjCBwAAKJ7wAQAAiid8AACA4gkfAACgeMIHAAAonvABAACKJ3wAAIDiCR8AAKB4wgcAACie8AEAAIonfAAAgOIJHwAAoHjCBwAAKJ7wAQAAiid8AACA4gkfAACgeMIHAAAonvABAACKJ3wAAIDiCR8AAKB4wgcAACie8AEAAIonfAAAgOIJHwAAoHjCBwAAKJ7wAQAAiid8AACA4gkfAACgeMIHAAAoXks9n/zDH/4wDz30UF599dV87WtfS0dHx5jn7dixI/fee2+GhoZy8cUXZ+XKlfUsCwAAcEzquuIzf/783HDDDTn33HOPeM7Q0FA2bdqUG2+8MRs3bsyTTz6ZV155pZ5lAQAAjkldV3zmzZt31HN27tyZOXPmZPbs2UmS888/P88888y4PhcAAGAyHPfX+PT396e9vX3kuL29Pf39/cd7WQAAgBFHveLT3d2dgYGBUY93dXVl2bJlR11geHh41GNNTU1HPL+3tze9vb1Jkp6entRqtaOu0Qh7Gr2BKepE/Xme6MzbxJi3iTFvE2PeJsa8TYx5mzgzNzElzNxRw2fdunV1LdDe3p59+/aNHO/bty+zZs064vmdnZ3p7OwcOe7r66trfU4sfp5UybxRJfNGlcwbVTuRZ27u3LnjOu+43+rW0dGR3bt3Z+/evTl06FCeeuqpLF269HgvCwAAMKKu8Hn66aezatWqvPTSS+np6cn69euT/Pfrem699dYkSXNzc6666qqsX78+X/ziF/Oxj30s8+fPr3/nAAAA41TXu7otX748y5cvH/V4W1tb1q5dO3K8ZMmSLFmypJ6lAAAAJuy43+oGAADQaMIHAAAonvABAACKJ3wAAIDiCR8AAKB4wgcAACie8AEAAIonfAAAgOIJHwAAoHjCBwAAKJ7wAQAAiid8AACA4gkfAACgeMIHAAAonvABAACKJ3wAAIDiCR8AAKB4wgcAACie8AEAAIonfAAAgOIJHwAAoHjCBwAAKJ7wAQAAiid8AACA4gkfAACgeMIHAAAonvABAACKJ3wAAIDiCR8AAKB4wgcAACie8AEAAIonfAAAgOIJHwAAoHjCBwAAKJ7wAQAAiid8AACA4gkfAACgeMIHAAAonvABAACKJ3wAAIDiCR8AAKB4wgcAACie8AEAAIonfAAAgOIJHwAAoHjCBwAAKJ7wAQAAiid8AACA4gkfAACgeMIHAAAoXks9n/zDH/4wDz30UF599dV87WtfS0dHx5jnfe5zn8upp56aadOmpbm5OT09PfUsCwAAcEzqCp/58+fnhhtuyD333HPUc2+++eacfvrp9SwHAAAwIXWFz7x58yZrHwAAAMdNXeFzLNavX58k+dSnPpXOzs6qlgUAADh6+HR3d2dgYGDU411dXVm2bNm4Funu7k5bW1v279+fr371q5k7d27OO++8Mc/t7e1Nb29vkqSnpye1Wm1ca1RtT6M3MEWdqD/PE515mxjzNjHmbWLM28SYt4kxbxNn5iamhJk7avisW7eu7kXa2tqSJDNnzsyyZcuyc+fOI4ZPZ2fnu64I9fX11b0+Jw4/T6pk3qiSeaNK5o2qncgzN3fu3HGdd9zfzvrtt9/OW2+9NfLxj3/84yxYsOB4LwsAADCirtf4PP300/nWt76VAwcOpKenJ2eddVa+/OUv
p7+/P3fffXfWrl2b/fv354477kiSHD58OBdccEEWL148KZsHAAAYj7rCZ/ny5Vm+fPmox9va2rJ27dokyezZs3P77bfXswwAAEBdjvutbgAAAI0mfAAAgOIJHwAAoHjCBwAAKJ7wAQAAiid8AACA4gkfAACgeMIHAAAonvABAACKJ3wAAIDiCR8AAKB4wgcAACie8AEAAIonfAAAgOIJHwAAoHjCBwAAKJ7wAQAAiid8AACA4gkfAACgeMIHAAAonvABAACKJ3wAAIDiCR8AAKB4wgcAACie8AEAAIonfAAAgOIJHwAAoHjCBwAAKJ7wAQAAiid8AACA4gkfAACgeMIHAAAonvABAACKJ3wAAIDiCR8AAKB4wgcAACie8AEAAIonfAAAgOIJHwAAoHjCBwAAKJ7wAQAAiid8AACA4gkfAACgeMIHAAAoXkujNzBVNX/j0UZv4YhqtVr6+voavQ0AADhhuOIDAAAUT/gAAADFEz4AAEDxhA8AAFA84QMAABSvrnd1u++++/Lss8+mpaUls2fPzrXXXpsZM2aMOm/Hjh259957MzQ0lIsvvjgrV66sZ1kAAIBjUtcVn0WLFmXDhg2544478v73vz9btmwZdc7Q0FA2bdqUG2+8MRs3bsyTTz6ZV155pZ5lAQAAjkld4fOhD30ozc3NSZJzzjkn/f39o87ZuXNn5syZk9mzZ6elpSXnn39+nnnmmXqWBQAAOCaT9hqfxx9/PIsXLx71eH9/f9rb20eO29vbxwwkAACA4+Wor/Hp7u7OwMDAqMe7urqybNmyJMnmzZvT3Nycj3/846POGx4eHvVYU1PTEdfr7e1Nb29vkqSnpye1Wu1oW+R/aWlp8X0rzJ5Gb2CK8udgYszbxJi3iTFvE2PeJs7MTUwJM3fU8Fm3bt2vfH7btm159tlnc9NNN40ZNO3t7dm3b9/I8b59+zJr1qwjfr3Ozs50dnaOHPf19R1ti/wvtVrN9w3i7w+qZd6oknmjaifyzM2dO3dc59V1q9uOHTvy3e9+N6tXr84pp5wy5jkdHR3ZvXt39u7dm0OHDuWpp57K0qVL61kWAADgmNT1dtabNm3KoUOH0t3dnSQ5++yzc/XVV6e/vz9333131q5dm+bm5lx11VVZv359hoaG8slPfjLz58+flM0DAACMR13hc+edd475eFtbW9auXTtyvGTJkixZsqSepQAAACZs0t7VDQAA4EQlfAAAgOIJHwAAoHjCBwAAKJ7wAQAAiid8AACA4gkfAACgeMIHAAAonvABAACKJ3wAAIDiCR8AAKB4wgcAACie8AEAAIonfAAAgOIJHwAAoHjCBwAAKJ7wAQAAiid8AACA4gkfAACgeMIHAAAonvABAACKJ3wAAIDiCR8AAKB4wgcAACie8AEAAIonfAAAgOIJHwAAoHjCBwAAKJ7wAQAAiid8AACA4gkfAACgeMIHAAAonvABAACKJ3wAAIDiCR8AAKB4wgcAACie8AEAAIonfAAAgOIJHwAAoHjCBwAAKJ7wAQAAiid8AACA4gkfAACgeMIHAAAonvABAACKJ3wAAIDiCR8AAKB4wgcAACie8AEAAIonfAAAgOK11PPJ9913X5599tm0tLRk9uzZufbaazNjxoxR533uc5/LqaeemmnTpqW5uTk9PT31LAsAAHBM6gqfRYsW5Y//+I/T3Nyc73znO9myZUuuvPLKMc+9+eabc/rpp9ezHAAAwITUdavbhz70oTQ3NydJzjnnnPT390/KpgAAACZTXVd8/qfHH388559//hGfX79+fZLkU5/6VDo7OydrWQAAgKM6avh0d3dnYGBg1ONdXV1ZtmxZkmTz5s1pbm7Oxz/+8SN+jba2tuzfvz9f/epXM3fu3Jx33nljntvb25ve3t4kSU9PT2q12rj/Y/hvLS0tvm+F2dPoDUxR/hxMjHmbGPM2MeZtYszbxJm5iSlh5o4aPuvWrfuVz2/bti3PPvtsbrrppjQ1NY15
TltbW5Jk5syZWbZsWXbu3HnE8Ons7HzXFaG+vr6jbZH/pVar+b5B/P1BtcwbVTJvVO1Enrm5c+eO67y6XuOzY8eOfPe7383q1atzyimnjHnO22+/nbfeemvk4x//+MdZsGBBPcsCAAAck7pe47Np06YcOnQo3d3dSZKzzz47V199dfr7+3P33Xdn7dq12b9/f+64444kyeHDh3PBBRdk8eLF9e8cAABgnOoKnzvvvHPMx9va2rJ27dokyezZs3P77bfXswyc9Jq/8Wijt3BEbq0EAKaCum51AwAAmAqEDwAAUDzhAwAAFE/4AAAAxRM+AABA8YQPAABQPOEDAAAUT/gAAADFEz4AAEDxhA8AAFA84QMAABRP+AAAAMUTPgAAQPFaGr0BAE4szd94tNFbOKJarZa+vr5GbwOAKcgVHwAAoHjCBwAAKJ7wAQAAiid8AACA4gkfAACgeMIHAAAonvABAACKJ3wAAIDiCR8AAKB4wgcAACie8AEAAIonfAAAgOIJHwAAoHjCBwAAKJ7wAQAAiid8AACA4gkfAACgeMIHAAAonvABAACKJ3wAAIDiCR8AAKB4wgcAACie8AEAAIonfAAAgOIJHwAAoHjCBwAAKJ7wAQAAiid8AACA4gkfAACgeMIHAAAonvABAACKJ3wAAIDiCR8AAKB4wgcAACie8AEAAIonfAAAgOK11PsFHnzwwWzfvj1NTU2ZOXNmrr322rS1tY06b9u2bdm8eXOS5A//8A9z0UUX1bs0AADAuNQdPpdcckm6urqSJI899lgefvjhXH311e86Z3BwMA8//HB6enqSJGvWrMnSpUvT2tpa7/IAAABHVfetbqeddtrIx++8806amppGnbNjx44sWrQora2taW1tzaJFi7Jjx456lwYAABiXuq/4JMkDDzyQH/zgBznttNNy8803j3q+v78/7e3tI8dtbW3p7++fjKUBAACOalzh093dnYGBgVGPd3V1ZdmyZfnMZz6Tz3zmM9myZUu+973v5fLLLz/q1xzrylCS9Pb2pre3N0nS09OTWq02ni3yP7S0tPi+URnzRpXMW3n2NHoDU5Q/BxNn5iamhJkbV/isW7duXF/sggsuSE9Pz6jwaWtry/PPPz9y3N/fn/POO2/Mr9HZ2ZnOzs6R476+vnGtzf9Xq9V836iMeaNK5g3+mz8HVO1Enrm5c+eO67y6X+Oze/fukY+3b98+5sKLFy/Oc889l8HBwQwODua5557L4sWL610aAABgXOp+jc/999+f3bt3p6mpKbVabeQd3Xbt2pWtW7dm1apVaW1tzaWXXpq1a9cmSS677DLv6AYAAFSm7vC54YYbxny8o6MjHR0dI8crVqzIihUr6l0OAADgmNV9qxsAAMCJTvgAAADFEz4AAEDxhA8AAFA84QMAABRP+AAAAMUTPgAAQPGEDwAAUDzhAwAAFE/4AAAAxRM+AABA8YQPAABQPOEDAAAUT/gAAADFEz4AAEDxhA8AAFA84QMAABSvpdEbAABOXs3feLTRWziiWq2Wvr6+Rm8DmCSu+AAAAMUTPgAAQPGEDwAAUDzhAwAAFE/4AAAAxRM+AABA8YQPAABQPOEDAAAUT/gAAADFEz4AAEDxhA8AAFA84QMAABRP+AAAAMUTPgAAQPGEDwAAUDzhAwAAFE/4AAAAxRM+AABA8YQPAABQPOEDAAAUT/gAAADFEz4AAEDxhA8AAFA84QMAABRP+AAAAMUTPgAAQPGEDwAAUDzhAwAAFE/4AAAAxRM+AABA8YQPAABQPOEDAAAUr6WeT37wwQezffv2NDU1ZebMmbn22mvT1tY26rwrrrgiCxYsSJLUarWsXr26nmUBAACOSV3hc8kll6SrqytJ8thjj+Xhhx/O1VdfPeq86dOn5/bbb69nKQAAgAmr61a30047beTjd955J01NTXVvCAAAYLLVdcUnSR544IH84Ac/yGmnnZabb755zHMOHjyYNWvWpLm5OZ/+9Kez
fPnyepcFAAAYt6bh4eHhX3VCd3d3BgYGRj3e1dWVZcuWjRxv2bIlBw8ezOWXXz7q3P7+/rS1tWXPnj255ZZbsm7dusyZM2fM9Xp7e9Pb25sk6enpyS9/+ctj+g8iaWlpyaFDhxq9DU4S5o0qmTeqZN7KtOf/nN/oLUxJs7c81egtHNH06dPHdd5Rw2e8Xn/99fT09GTDhg2/8ry77rorH/nIR/LRj350XF/35z//+WRs76RSq9XS19fX6G1wkjBvVMm8USXzVqbDn72k0VuYkpq/8Wijt3BEc+fOHdd5db3GZ/fu3SMfb9++fcxFBwcHc/DgwSTJgQMH8uKLL2bevHn1LAsAAHBM6nqNz/3335/du3enqakptVpt5B3ddu3ala1bt2bVqlV59dVXc88992TatGkZGhrKypUrhQ8AAFCpSbvV7Xhxq9uxc2meKpk3qmTeqJJ5K5Nb3SbmpL/VDQAAYCoQPgAAQPGEDwAAUDzhAwAAFE/4AAAAxRM+AABA8YQPAABQPOEDAAAUT/gAAADFEz4AAEDxhA8AAFA84QMAABRP+AAAAMUTPgAAQPGEDwAAUDzhAwAAFE/4AAAAxRM+AABA8YQPAABQPOEDAAAUT/gAAADFEz4AAEDxhA8AAFA84QMAABRP+AAAAMUTPgAAQPGEDwAAUDzhAwAAFE/4AAAAxRM+AABA8YQPAABQPOEDAAAUr6XRGwAAgKo0f+PRRm/hiGq1Wvr6+hq9jWK54gMAABRP+AAAAMUTPgAAQPGEDwAAUDzhAwAAFE/4AAAAxRM+AABA8YQPAABQPOEDAAAUT/gAAADFEz4AAEDxhA8AAFA84QMAABRP+AAAAMUTPgAAQPGEDwAAUDzhAwAAFG/SwufRRx/N5ZdfngMHDoz5/LZt2/KFL3whX/jCF7Jt27bJWhYAAOCoWibji/T19eXf//3fU6vVxnx+cHAwDz/8cHp6epIka9asydKlS9Pa2joZywMAAPxKk3LF59vf/nb+5E/+JE1NTWM+v2PHjixatCitra1pbW3NokWLsmPHjslYGgAA4KjqDp/t27enra0tZ5111hHP6e/vT3t7+8hxW1tb+vv7610aAABgXMZ1q1t3d3cGBgZGPd7V1ZUtW7bkL//yL4954SNdHert7U1vb2+SpKen54i3z3FkLS0tvm9UxrxRJfNGlcwbVTNzx1fT8PDw8EQ/+Wc/+1luueWWnHLKKUmSffv2ZdasWbn11ltzxhlnjJz3xBNP5Pnnn8/VV1+dJLnnnnty3nnn5YILLqhz+wAAAEdX161uCxYsyDe/+c3cddddueuuu9Le3p7bbrvtXdGTJIsXL85zzz2XwcHBDA4O5rnnnsvixYvr2jhHtmbNmkZvgZOIeaNK5o0qmTeqZuaOr0l5V7ex7Nq1K1u3bs2qVavS2tqaSy+9NGvXrk2SXHbZZd7RDQAAqMykhs9dd9018nFHR0c6OjpGjlesWJEVK1ZM5nIAAADjMmm/wJQTR2dnZ6O3wEnEvFEl80aVzBtVM3PHV11vbgAAADAVuOIDAAAUT/gAAADFEz6F2r17d6O3QKGGhoZGPTY4ONiAnXAy+H93Yx8+fDgvv/xy3nzzzQbviJPF1q1bG70FTiLvvPNOXn755bz11luN3krRjtvbWdNYt9xyS/7+7/++0dugIM8//3zuvPPOvPPOOzn77LPz2c9+duS3S3d3d+e2225r8A4pyfbt23P33Xenqakp11xzTTZv3pyWlpa89tprueaaa7JkyZJGb5GCPPbYY6Me++d//uccPHgwSfK7v/u7VW+Jwn3rW9/KVVddlSR56aWXsnHjxrzvfe/L3r17s2rVKr/v8jgRPlPYt7/97TEfHx4e9q+iTLr77rsva9asyYIFC/LUU0+lu7s7n//857Nw4cJ4jxQm2z/90z/lr//6r/POO+9k9erVWb9+febNm5e9e/dm48aNwodJ9cADD+TDH/5wfu3Xfm3k77OhoaEcOHCg
wTujVC+++OLIxw888ECuv/76LFy4MK+99lr+9m//VvgcJ8JnCuvt7c2VV16Z97znPaOea2nxo2VyHTp0KB/4wAeSJL/zO7+T+fPnZ8OGDfnTP/3TNDU1NXh3lGjWrFlJklqtlnnz5iVJzjzzzDFvt4R6bNiwIf/wD/+QoaGhXHrppZk+fXqeeOKJdHV1NXprnATefPPNLFy4MEkyZ84cf8cdR/7veApbuHBhfv3Xfz3nnHPOqOceeuihBuyIkk2bNi0DAwM544wzkiQLFizIunXr0tPTk9dff73Bu6M0w8PDGRoayrRp03LNNdeMPD40NJRDhw41cGeU6Mwzz8wNN9yQH/3oR+nu7s4f/MEfNHpLFO7VV1/N6tWrMzw8nD179uQXv/hFZsyY4e+448zv8ZnCDhw4kOnTp+fUU09t9FY4CezYsSNnnHFGzjrrrHc9Pjg4mH/913/NH/3RHzVmYxTppZdeyllnnZXp06e/6/G9e/fm+eefz0UXXdSYjVG8t99+Ow8++GB27dqV7u7uRm+HQr322mvvOq7VamlpacmBAwfyk5/8JB/72McatLOyCR8AAKB43s4aAAAonvABAACKJ3wAAIDiCZ8CrF+/Pr/4xS9GjgcHB3Prrbc2cEeUzLxRJfNGlcwbVTNz1RI+Bdi/f39mzJgxctza2po33nijgTuiZOaNKpk3qmTeqJqZq5bwKUBTU1P27ds3ctzX19fA3VA680aVzBtVMm9UzcxVy9tZF+Df/u3f8s1vfjO/+Zu/mST5yU9+kj/7sz/Lhz/84QbvjBKZN6pk3qiSeaNqZq5awqcQAwMDeemll5IkH/zgBzNz5swG74iSmTeqZN6oknmjamauOsJnCtu9e3fe//735+WXXx7z+Q984AMV74iSmTeqZN6oknmjamauMVoavQEm7pFHHsmf//mfZ9OmTaOea2pqyl/91V81YFeUyrxRJfNGlcwbVTNzjeGKDwAAUDxXfApw8ODBbN26NS+88EKamppy7rnn5uKLL8573vOeRm+NApk3qmTeqJJ5o2pmrlqu+BTgb/7mb9LS0pJPfOITSZInnngiv/zlL3Pdddc1eGeUyLxRJfNGlcwbVTNz1XLFpwCvvPJK7rjjjpHjRYsW5S+edYMcAAACZklEQVT+4i8auCNKZt6oknmjSuaNqpm5avkFpgU466yzsnPnzpHjn/70pzn77LMbuCNKZt6oknmjSuaNqpm5arnVrQDXX399XnnllZx55plJkr1792b+/PmZNm1ampqacttttzV4h5TEvFEl80aVzBtVM3PVEj4FeO21137l83PmzKloJ5wMzBtVMm9UybxRNTNXLeFTiJ/97Gd54YUXkiTnnntu5s+f3+AdUTLzRpXMG1Uyb1TNzFWn+Stf+cpXGr0J6vO9730v3/nOdzJr1qy8+eab2bx5c5Jk4cKFDd4ZJTJvVMm8USXzRtXMXMWGmfKuv/764bfeemvk+K233hq+/vrrG7gjSmbeqJJ5o0rmjaqZuWp5V7cCDA8Pp7m5eeS4ubk5w+5g5Dgxb1TJvFEl80bVzFy1/B6fKezw4cNpbm7OJz7xiXz5y1/Ob//2bydJnn766Vx44YUN3h2lMW9UybxRJfNG1cxcY3hzgyls9erVI29zuHPnzrzwwgsZHh7Oueee695QJp15o0rmjSqZN6pm5hrDFZ8p7H8268KFC/1B4bgyb1TJvFEl80bVzFxjCJ8p7MCBA/mXf/mXIz7/+7//+xXuhtKZN6pk3qiSeaNqZq4xhM8UNjQ0lLffftuL4KiEeaNK5o0qmTeqZuYaQ/hMYbNmzcpll13W6G1wkjBvVMm8USXzRtXMXGN4O+spzL8SUCXzRpXMG1Uyb1TNzDWGd3WbwgYHB9Pa2trobXCSMG9UybxRJfNG1cxcYwgfAACgeG51AwAAiid8AACA4gkfAACgeMIHAAAonvABAACK938BA24pB5lkfPoAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 1008x626.4 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "topic_coherence = []\n",
    "topic_words = pd.DataFrame()\n",
    "for t in range(len(coherence)):\n",
    "    label = topic_labels[t]\n",
    "    topic_coherence.append(coherence[t][1])\n",
    "    df = pd.DataFrame(coherence[t][0], columns=[(label, 'prob'), (label, 'term')])\n",
    "    df[(label, 'prob')] = df[(label, 'prob')].apply(lambda x: '{:.2%}'.format(x))\n",
    "    topic_words = pd.concat([topic_words, df], axis=1)\n",
    "                      \n",
    "topic_words.columns = pd.MultiIndex.from_tuples(topic_words.columns)\n",
    "pd.set_option('expand_frame_repr', False)\n",
    "topic_words.head().to_csv('topic_words.csv', index=False)\n",
    "print(topic_words.head())\n",
    "\n",
    "pd.Series(topic_coherence, index=topic_labels).plot.bar();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Using `gensim` `Dictionary`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T05:04:28.532936Z",
     "start_time": "2018-05-01T05:02:39.320Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
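   },
   "outputs": [],
   "source": [
    "# whitespace tokenization; note that `docs` is rebound from the DataFrame to a list of token lists\n",
    "docs = [d.split() for d in train_docs.article.tolist()]\n",
    "stop_word_set = set(stop_words)  # set lookup is faster than scanning the list for every token\n",
    "docs = [[t for t in doc if t not in stop_word_set] for doc in docs]"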
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T05:04:28.533820Z",
     "start_time": "2018-05-01T05:02:39.496Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "dictionary = Dictionary(docs)\n",
    "dictionary.filter_extremes(no_below=min_df, no_above=max_df, keep_n=max_features)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T05:04:28.534812Z",
     "start_time": "2018-05-01T05:02:39.648Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "corpus = [dictionary.doc2bow(doc) for doc in docs]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T05:04:28.535717Z",
     "start_time": "2018-05-01T05:02:39.825Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of unique tokens: 2000\n",
      "Number of documents: 2175\n"
     ]
    }
   ],
   "source": [
    "print('Number of unique tokens: %d' % len(dictionary))\n",
    "print('Number of documents: %d' % len(corpus))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T05:04:28.536760Z",
     "start_time": "2018-05-01T05:02:42.816Z"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "num_topics = 5\n",
    "chunksize = 500\n",
    "passes = 20\n",
    "iterations = 400\n",
    "eval_every = None  # Skip perplexity evaluation during training; it is slow.\n",
    "\n",
    "temp = dictionary[0]  # Accessing an item populates the lazily built id2token mapping.\n",
    "id2word = dictionary.id2token"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T05:04:28.537677Z",
     "start_time": "2018-05-01T05:02:45.832Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "model = LdaModel(corpus=corpus,\n",
    "                 id2word=id2word,\n",
    "                 chunksize=chunksize,\n",
    "                 alpha='auto',\n",
    "                 eta='auto',\n",
    "                 iterations=iterations,\n",
    "                 num_topics=num_topics,\n",
    "                 passes=passes,\n",
    "                 eval_every=eval_every)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T05:04:28.538730Z",
     "start_time": "2018-05-01T05:02:46.967Z"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[(0,\n",
       "  '0.007*\"company\" + 0.007*\"growth\" + 0.006*\"market\" + 0.006*\"economic\" + 0.006*\"oil\" + 0.006*\"sales\" + 0.005*\"firm\" + 0.005*\"rise\" + 0.005*\"economy\" + 0.005*\"prices\"'),\n",
       " (1,\n",
       "  '0.010*\"technology\" + 0.009*\"mobile\" + 0.008*\"use\" + 0.008*\"digital\" + 0.007*\"music\" + 0.007*\"games\" + 0.006*\"users\" + 0.006*\"used\" + 0.006*\"software\" + 0.006*\"net\"'),\n",
       " (2,\n",
       "  '0.012*\"Labour\" + 0.011*\"government\" + 0.009*\"Blair\" + 0.007*\"election\" + 0.006*\"public\" + 0.006*\"party\" + 0.006*\"Brown\" + 0.005*\"say\" + 0.005*\"Howard\" + 0.005*\"minister\"'),\n",
       " (3,\n",
       "  '0.009*\"game\" + 0.008*\"win\" + 0.008*\"England\" + 0.007*\"good\" + 0.006*\"think\" + 0.006*\"play\" + 0.005*\"players\" + 0.005*\"got\" + 0.005*\"And\" + 0.005*\"it\\'s\"'),\n",
       " (4,\n",
       "  '0.024*\"best\" + 0.021*\"film\" + 0.012*\"won\" + 0.009*\"music\" + 0.008*\"British\" + 0.008*\"TV\" + 0.007*\"including\" + 0.007*\"director\" + 0.007*\"UK\" + 0.007*\"star\"')]"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model.show_topics()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Evaluating Topic Assignments on the Test Set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T05:04:28.539924Z",
     "start_time": "2018-05-01T05:02:50.153Z"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "docs_test = [d.split() for d in test_docs.article.tolist()]\n",
    "docs_test = [[t for t in doc if t not in stop_words] for doc in docs_test]\n",
    "\n",
    "# Encode test documents with the training dictionary so token ids match the model's vocabulary\n",
    "test_corpus = [dictionary.doc2bow(doc) for doc in docs_test]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T05:04:28.541193Z",
     "start_time": "2018-05-01T05:02:50.336Z"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.11</td>\n",
       "      <td>0.07</td>\n",
       "      <td>0.09</td>\n",
       "      <td>2.81</td>\n",
       "      <td>67.32</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>6.82</td>\n",
       "      <td>60.50</td>\n",
       "      <td>27.93</td>\n",
       "      <td>0.10</td>\n",
       "      <td>0.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.11</td>\n",
       "      <td>32.94</td>\n",
       "      <td>0.09</td>\n",
       "      <td>51.46</td>\n",
       "      <td>6.79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>61.13</td>\n",
       "      <td>0.07</td>\n",
       "      <td>32.06</td>\n",
       "      <td>0.10</td>\n",
       "      <td>0.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.11</td>\n",
       "      <td>0.07</td>\n",
       "      <td>0.09</td>\n",
       "      <td>115.79</td>\n",
       "      <td>4.33</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>63.55</td>\n",
       "      <td>0.07</td>\n",
       "      <td>32.64</td>\n",
       "      <td>0.10</td>\n",
       "      <td>0.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>42.69</td>\n",
       "      <td>0.07</td>\n",
       "      <td>0.09</td>\n",
       "      <td>2.51</td>\n",
       "      <td>0.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0.11</td>\n",
       "      <td>0.07</td>\n",
       "      <td>26.56</td>\n",
       "      <td>22.62</td>\n",
       "      <td>0.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>103.20</td>\n",
       "      <td>0.07</td>\n",
       "      <td>26.73</td>\n",
       "      <td>0.10</td>\n",
       "      <td>6.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>54.08</td>\n",
       "      <td>0.07</td>\n",
       "      <td>0.09</td>\n",
       "      <td>0.10</td>\n",
       "      <td>7.07</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       0     1     2      3     4\n",
       "0   0.11  0.07  0.09   2.81 67.32\n",
       "1   6.82 60.50 27.93   0.10  0.05\n",
       "2   0.11 32.94  0.09  51.46  6.79\n",
       "3  61.13  0.07 32.06   0.10  0.05\n",
       "4   0.11  0.07  0.09 115.79  4.33\n",
       "5  63.55  0.07 32.64   0.10  0.05\n",
       "6  42.69  0.07  0.09   2.51  0.05\n",
       "7   0.11  0.07 26.56  22.62  0.05\n",
       "8 103.20  0.07 26.73   0.10  6.29\n",
       "9  54.08  0.07  0.09   0.10  7.07"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# gamma holds the unnormalized topic weights (variational Dirichlet parameters) per document\n",
    "gamma, _ = model.inference(test_corpus)\n",
    "topic_scores = pd.DataFrame(gamma)\n",
    "topic_scores.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T05:04:28.542544Z",
     "start_time": "2018-05-01T05:02:50.479Z"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.96</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.07</td>\n",
       "      <td>0.63</td>\n",
       "      <td>0.29</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.00</td>\n",
       "      <td>0.36</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.56</td>\n",
       "      <td>0.07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.65</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.34</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.96</td>\n",
       "      <td>0.04</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     0    1    2    3    4\n",
       "0 0.00 0.00 0.00 0.04 0.96\n",
       "1 0.07 0.63 0.29 0.00 0.00\n",
       "2 0.00 0.36 0.00 0.56 0.07\n",
       "3 0.65 0.00 0.34 0.00 0.00\n",
       "4 0.00 0.00 0.00 0.96 0.04"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Normalize each row so that a document's topic weights sum to one\n",
    "topic_probabilities = topic_scores.div(topic_scores.sum(axis=1), axis=0)\n",
    "topic_probabilities.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T05:04:28.544253Z",
     "start_time": "2018-05-01T05:02:50.631Z"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    4\n",
       "1    1\n",
       "2    3\n",
       "3    0\n",
       "4    3\n",
       "dtype: int64"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "topic_probabilities.idxmax(axis=1).head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-05-01T05:04:28.545304Z",
     "start_time": "2018-05-01T05:02:52.185Z"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAvMAAAIMCAYAAAB4wSMbAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Xl8VPW5x/HvmZCQhLBkAZWtGIFCBBQIslgQJW6IolhQEG+Aq0Vwqdgq4Eat1YIWlSUsBQGvdUUFlYJcYxWuwrUgoKxhjRUChZCEkBIIs9w/vB2JREkw53cy53zevuZ1k5PMPM+Qa33y5ZnfWKFQKCQAAAAAEcfndAMAAAAAzg7DPAAAABChGOYBAACACMUwDwAAAEQohnkAAAAgQjHMAwAAABGKYR4AAACIUAzzAAAAQIRimAcAAAAiVC2nG6iMuOumOt0CHLBrzi+dbgEGJTVq7HQLcEDBwTynWwBgo8aNa+7/tsd1vMfWxy9dP93Wx/83knkAAAAgQkVEMg8AAABUK8sdmbY7ngUAAADgQSTzAAAA8B7LcrqDakEyDwAAAEQoknkAAAB4DzvzAAAAAJxEMg8AAADvYWceAAAAgJNI5gEAAOA9LtmZZ5gHAACA97BmAwAAAMBJJPMAAADwHpes2bjjWQAAAAAeRDIPAAAA72FnHgAAAICTSOYBAADgPezMAwAAAHASyTwAAAC8h515AAAAAE4imQcAAID3sDMPAAAAwEkk8wAAAPAel+zMM8wDAADAe1izAQAAAOAkknkAAAB4D8k8AAAAACeRzAMAAMB7fO54ASzJPAAAABChSOYBAADgPezMAwAAAHASyTwAAAC8xyVvGkUyDwAAAEQoknkAAAB4j0t25hnmAQAA4D2s2QAAAABwEsk8AAAAvMclazbueBYAAACAB5HMAwAAwHvYmQcAAADgJJJ5AAAAeA878wAAAACcRDJfg8z6dR9de8n5OlRUqvS7X5EkJSbU1svjrtXPGtXT1weLNXTiMhWVnHC4UwDV4fFHx2vlik+UlJSsd95d4nQ7AOAt7Myjur2cvVX9H3+33LXfDkzXJ19+o/a/+i998uU3+u3Azg51B6C69b9xgGbOnut0GwAAB8yYMUN33HGHfvOb34SvlZSU6Mknn9R9992nJ598UiUlJWd8HIb5GuSzzXkqOHq83LV+3VL1l+ytkqS/ZG/V9d0ucKI1ADbonN5F9erXd7oNAPAmy2fv7Qx69+6thx9+uNy1xYsXq3379po6darat2+vxYsXn/FxjA3zS5cu1bFjxxQKhTRz5kyNHTtWX375panyEatRg3gdKDwmSTpQeEwNG8Q53BEAAAB+qrS0NCUkJJS7tmbNGl122WWSpMsuu0xr1qw54+MY25n/+OOP1bdvX23YsEHFxcUaNWqUZs6cqYsuuqjC78/OzlZ2dvb/f9bYVJsAAADwApt35svPslJGRoYyMjJ+9D5HjhxRYmKiJCkxMVHFxcVnrGNsmA+FQpKk9evX6/LLL1eLFi3C1ypy6hOect1UIz3WRAeLjuncxG/T+XMT43WoqNTplgAAACKfzUdTVmZ4rw7G1mxSU1P1hz/8QevXr9dFF12k0tJSWS55FbGd/vr5bg3NaCtJGprRVkv+d7fDHQEAAMAO9evXV2FhoSSpsLBQ9erVO+N9jCXzd911l3Jzc3XOOeeodu3aKikp0ejRo02VjwgvPXS1erZvqpR6sdr50gg9+cr/6k8Lv9Bfxl2rzCsv1DeHjuq2Py51uk0A1WTsbx/Q2jV/V1FRoa68opdG3X2vBtw80Om2AMAbauCbRqWnp2vFihW68cYbtWLFCnXp0uWM97FCP7brUo22bdumFi1aKDY2VitXrtSePXvUt29fNWzY8Iz3jfPwmo2X7ZrzS6dbgEFJjXhtjBcVHMxzugUANmrcuOb+b3vc9TNsffzS9388tH7hhRe0
ZcsWHT16VPXr19egQYPUpUsXPf/888rPz1dKSooeeOCB014k+33Gkvm5c+fq2WefVW5urt577z1dccUVmj59up544glTLQAAAADfcnjd+/7776/w+uOPP16lxzH29wtRUVGyLEtr165V37591bdvXx0/fvzMdwQAAABQIWPDfGxsrBYtWqSVK1eqU6dOCgaD8vv9psoDAAAA33H4TaOqi7FKY8aMUXR0tEaNGqUGDRqooKBAN9xwg6nyAAAAgOsYG+YbNGigrl276uTJk5KkunXr6pJLLjFVHgAAAPiOZdl7M8TYMJ+dna3nnntOc+bMkSQVFBTo2WefNVUeAAAAcB1jw/zy5cv15JNPKi4uTpJ03nnn6ciRI6bKAwAAAN9xyc68saMpo6OjVavWd+UCgQDvAAsAAABnuGQONTbMp6Wl6Z133lFZWZm++uorLV++XJ07dzZVHgAAAHAdY38HMGTIENWrV0/NmzfXhx9+qI4dO+rWW281VR4AAAAIsyzL1pspxpJ5n8+njIwMZWRkmCoJAAAAuJqxYX7btm1auHCh8vPzFQgEFAqFZFmWpk+fbqoFAAAAQJJc89pNY8P8rFmzlJmZqdTUVPl85l7hCwAAALiVsWE+Pj5eHTt2NFUOAAAA+GHuCObNDfMXXnihXn75ZXXt2rXcEZWpqammWgAAAABcxdgwv3PnTknS7t27y12fMGGCqRYAAAAASezMVxlDOwAAAFC9bB/mV65cqV69emnJkiUVfr1fv352twAAAACUQzJfSSdOnJAklZaW2l0KAAAA8BTbh/krr7xSkjRw4EC7SwEAAACV4pZk3tiB73/5y1907Ngx+f1+/f73v9d//ud/auXKlabKAwAAAGGWZdl6M8XYMP/ll18qPj5e69atU1JSkqZMmaL333/fVHkAAADAdYydZhMIBCRJ69at0y9+8QslJCSYKg0AAACU544tG3PJfOfOnXX//fdr9+7dateunYqLixUdHW2qPAAAAOA6xpL52267Tf3791d8fLx8Pp9q166thx56yFR5AAAAIMwtL4A1NsyvWLGiwuuXXXaZqRYAAAAAVzE2zO/atSv8cVlZmTZt2qTzzz+fYR4AAADGkcxX0YgRI8p9fuzYMU2bNs1UeQAAAMB1jA3z3xcTE6MDBw44VR4AAAAeRjJfRRMnTgz/oYVCIe3du1fdu3c3VR4AAABwHWPD/A033BD+2OfzqWHDhkpOTjZVHgAAAAgjma+itLS08MfFxcWqW7euqdIAAABAee6Y5e0f5rdv365XX31VCQkJuvnmmzV9+nQVFxcrFArpnnvu0cUXX2x3CwAAAIAr2T7Mz5s3T4MHD9axY8f0+9//XuPHj1fr1q21b98+TZkyhWEeAAAAxrllzcZnd4FAIKCLLrpI3bt3V4MGDdS6dWtJUpMmTewuDQAAALia7cm8z/fd7wsxMTHlvuaW34gAAAAQWdwyh9o+zOfm5iozM1OhUEhlZWXKzMyU9O3xlCdPnrS7PAAAAOBatg/zb7zxht0lAAAAgCpxSzJv+848AAAAAHsYO2ceAAAAqDHcEcyTzAMAAACRimQeAAAAnuOWnXmGeQAAYESnCaucbgGGHZjzS6dbcL2IGOZ38f8IgOsVHMxzugUAgIeQzAMAAAARyi3DPC+ABQAAACIUyTwAAAA8h2QeAAAAgKNI5gEAAOA97gjmSeYBAACASEUyDwAAAM9hZx4AAACAo0jmAQAA4Dkk8wAAAAAcRTIPAAAAzyGZBwAAAOAoknkAAAB4jzuCeYZ5AAAAeA9rNgAAAAAcRTIPAAAAzyGZBwAAAOAoknkAAAB4Dsk8AAAAAEeRzAMAAMBzSOYBAAAAOIpkHgAAAN7jjmCeZB4AAACIVCTzAAAA8By37MwzzAMAAMBz3DLMs2YDAAAARCiSeQAAAHiOS4J5knkAAAAgUpHMAwAAwHPYmQcAAADgKJJ5AAAAeI5LgnmSeQAAACBSkcwDAADAc9iZ
BwAAAOAoknkAAAB4jkuCeZJ5AAAAIFKRzAMAAMBzfD53RPMM8wAAAPAc1mwAAAAAOIpkHgAAAJ7D0ZQAAAAAHEUyDwAAAM9xSTDPMA8AAGCXO/q01NCe58uypL+s3KM5H+10uiW4DMM8AACADdo0rqehPc/XtU//TWX+oF779S+UvfGA9hwscbo1yPmd+SVLluhvf/ubLMtSs2bNNHr0aMXExFT5cdiZBwAAsEGr8+rqi90FKi0LKBAMafX2fPXt2NjptlADFBQUaNmyZZo4caImT56sYDCoVatWndVjGRvmp02bVqlrAAAAbrBtX7G6tU5RYp0YxcVEqU/7c9U4Kd7ptvD/LMuy9XYmwWBQZWVlCgQCKisrU2Ji4lk9D2NrNnv37i33eTAY1O7du3/w+7Ozs5WdnS1JSk9PV48ePWztDwAAoDrtOHBU0z/I0RtjeupfJ/zavLdI/kDI6bZgyKmzrCRlZGQoIyNDkpSUlKTrr79eo0aNUkxMjC666CJddNFFZ1XH9mF+0aJFWrRokcrKypSZmSlJCoVCqlWrVvgJVeTUJ5yXl2d3mwAAANXutU9z9dqnuZKk8Te10/7CY842hDC7V+ZPnWW/r6SkRGvWrFFWVpbi4+P13HPPaeXKlerVq1eV69g+zN9000266aab9Oqrr2rIkCF2lwMAAKgxUurWVv7RE2qSFKe+HRur38SPnW4J/8/JF8Bu3LhRjRo1Ur169SRJXbt21fbt22vmMP9vQ4YMUUFBgQ4dOqRAIBC+npaWZqoFAAAAo+aO6q6kOjE6GQhq/KsbdOTYSadbQg2QkpKiHTt26MSJE4qJidHGjRt1wQUXnNVjGRvmX3nlFa1atUpNmzYN/yZkWRbDPAAAcK0bn/nE6RbwA5w8mbJVq1bq1q2bxo4dq6ioKLVo0eJH189/jLFh/u9//7teeOEFRUdHmyoJAAAA1EiDBg3SoEGDfvLjGBvmzznnHAUCAYZ5AAAAOM7pN42qLsaG+ZiYGD344INq3769atX6ruyIESNMtQAAAAC4irFhPj09Xenp6abKAQAAAD/IJcG8uWG+d+/eKisrU35+vho35q2MAQAAgJ/KZ6rQ2rVr9eCDD+qpp56SJOXm5mrSpEmmygMAAABhlmXZejPF2DC/cOFC/fGPf1SdOnUkSS1atNDBgwdNlQcAAABcx9iaTVRUlOLj48tdc8uriAEAABBZ3DKGGhvmmzVrpk8//VTBYFD79+/XsmXL1Lp1a1PlAQAAANcxtmYzYsQIffPNN4qOjtaUKVMUFxenYcOGmSoPAAAAhLllZ95YMl+7dm0NHjxYgwcPNlUSAAAAqBBrNlW0a9cuLVq0SIcOHVIgEAhf/9Of/mSqBQAAAMBVjA3zU6dO1e23367mzZvzwlcAAAA4yi3zqLFhvl69erwDLAAAAFCNjA3zgwYN0qxZs9SuXTtFR0eHr3ft2tVUCwAAAIAkduar7OOPP1ZeXp78fr98vu8O0WGYBwAAAM6OsWH+66+/1uTJk02VAwAAAH6QW3bmjZ0z36pVK+3du9dUOQAAAMD1jCXzOTk5WrFihRo1aqTo6GiFQiFZlsXRlAAAADDOJcG8uWH+4YcfNlUKAAAA8ARjw3zDhg0VDAZVVFSkYDBoqiwAAABwGrfszBsb5pctW6a33npL9evXD//hsWYDAAAAJzDMV9HSpUv1wgsvqG7duqZKAgAAAK5mbJhPSUlRfHy8qXIAAADAD3JJMG9umG/UqJF+97vfqVOnTuXeAbZfv36mWgAAAABcxWgyn5KSIr/fL7/fb6osAAAAcBp25qto4MCBpkoBAAAAnmD7ML9gwQINGzZMEydOrPA3oLFjx9rdAgAAAFCOS4J5+4f5Xr16SZJuuOEGu0sBAAAAnmL7MJ+amipJSktLs7sUAAAAUCnszFfR/v379eqrr2rv3r06efJk+Pr06dNNtQAAAAC4is9UoRkzZuiqq65SVFSUJkyYoF69eoVXcAAA
AACTLMvemynGhvmysjK1b99eoVBIDRs21KBBg7Rp0yZT5QEAAADXMbZmExMTo2AwqPPOO08ffPCBkpKSdOTIEVPlAQAAgDCfS3bmjSXzmZmZKisr0/Dhw7V7926tXLlS99xzj6nyAAAAQJhb1myMJfOHDh1Sy5YtFRsbq9GjR0uSVq9erVatWplqAQAAAHAVY8n84sWLK3UNAAAAsJtlWbbeTLE9mV+/fr3Wr1+vgoICzZs3L3y9tLRUPp+x3yUAAAAA17F9mE9MTFRqaqrWrl0bfgMpSYqLi1NmZqbd5QEAAIDT+Nzx+lf7h/kWLVqoefPm+uqrr9S7d2+7ywEAAACeYeQFsD6fT0ePHpXf71etWsZecwsAAABUyOReu52MTdYNGzbUY489ps6dOys2NjZ8vV+/fqZaAAAAAFzF2DCfmJioxMREhUIhlZaWmioLIEJ0mrDK6RbggHVP9HC6BQAe5ZJg3twwP3DgQEnS8ePHyyXzAADAG/jlDah+xs6G3L59u8aMGaMxY8ZIknJzczV37lxT5QEAAIAwy+Z/TDE2zC9YsECPPPKI6tatK+nbU262bt1qqjwAAAAQ5rPsvRl7HuZKSSkpKeWL86ZRAAAAwFkztjOfnJysnJwcWZYlv9+vpUuXqkmTJqbKAwAAAGFuOZrSWDR+5513avny5SooKNBdd92l3Nxc3XHHHabKAwAAAK5jLJnPy8vTfffdV+7atm3b1KZNG1MtAAAAAJLcczSlsWR+/vz5lboGAAAAoHJsT+a3b9+unJwcFRcXa8mSJeHrx44dUzAYtLs8AAAAcBqfS6J524d5v9+v48ePKxAIlHvn1/j4eD3wwAN2lwcAAABcy/ZhPi0tTWlpaerdu7caNmxodzkAAADgjFwSzJt7AezJkyc1e/ZsHTp0SIFAIHx9woQJploAAAAAXMXYMP/888/ryiuvVJ8+fXizKAAAADjKLefMGxvmfT6frrrqKlPlAAAAANczFpF37txZy5cvV2FhoUpKSsI3AAAAwDTLsvdmirFkfsWKFZKk9957L3zNsixNnz7dVAsAAACAJI6mrLKsrCxTpQAAAABPsH3N5t133w1/vHr16nJfe/XVV+0uDwAAAJzGsvlmiu3D/KpVq8IfL168uNzXvvzyS7vLAwAAAK5l+5pNKBSq8OOKPgcAAABMcMvRlLYn86f+QX3/D80tf4gAAACAE2xP5nNzc5WZmalQKKSysjJlZmZK+jaVP3nypN3lAQAAgNP4XJIp2z7Mv/HGG3aXAAAAADzJ2NGUAAAAQE3hlnVvY+8ACwAAAKB6kcwDAADAc1wSzJPMAwAAAJGKZB4AAACe45adeYZ5AAAAeI5bjqZkzQYAAACIUCTzAAAA8By3rNmQzAMAAAARimQeAAAAnuOOXJ5kHgAAAIhYJPMAAADwHJ/XdubnzZunnJycctdycnK0YMGC6u4JAAAAQCVUepj/7LPPdMEFF5S7lpqaqk8//bTamwIAAADsZFn23kyp9DBvWZaCwWC5a8FgUKFQqNqbAgAAAHBmlR7m27Rpo9dffz080AeDQS1cuFBt2rSxrTkAAADADpZl2XozpdIvgB0+fLgmTpyokSNHKiUlRfn5+UpMTNTYsWPt7A8AAACodi55/Wvlh/nk5GRNmjRJO3fu1OHDh5WcnKyWLVvK5+N0SwAAAMAJVTqa0ufzqXXr1nb1AgAAABjhlqMpf3SYHzNmjJ5//nlJ0qhRo37w+2bOnFm9XQGAB9zRp6WG9jxfliX9ZeUezflop9MtAQAizI8O8yNHjgx/fO+999reDAB4RZvG9TS05/m69um/qcwf1Gu//oWyNx7QnoMlTrcGAJ5QE4L5f/3rX5o1a5a++eYbWZalUaNGVXkL5keH+VNPqklLSzu7LgEAp2l1Xl19sbtApWUBSdLq7fnq27GxspZvd7gzAIAp8+fP18UXX6zf/OY38vv9OnHiRJUfo9I7836/X2+//bY+++wzFRYW
KjExUT169NCAAQMUExNzxvuvXr1aF198seLi4vT2229rz549GjBggFJTU6vcNABEum37ijXupnZKrBOj4ycD6tP+XH35daHTbQGAZ5g8PrIix44d09atW3X33XdLkmrVqqVatar0ctZv71fZb5wzZ47y8vI0fPhwNWzYUIcOHdLixYs1d+5cjR49+oz3f/vtt9W9e3dt27ZNX375pa6//nrNnTtXTz/9dIXfn52drezsbElSenq6evToUdlWAaDG23HgqKZ/kKM3xvTUv074tXlvkfwB3oQPANzi1FlWkjIyMpSRkRH+/ODBg6pXr55mzJihr7/+WqmpqRo2bJhiY2OrVKfSw/yaNWs0bdo01alTR5LUtGlTtWrVqtK79P8+wnLdunW66qqr1KVLFy1cuPAHv//UJ5yXl1fZNgEgYrz2aa5e+zRXkjT+pnbaX3jM2YYAwEPsPlz9+8P79wUCAe3Zs0cjRoxQq1atNH/+fC1evFi33nprlepU+nk0aNDgtD2esrIyJSYmVur+SUlJ+vOf/6zVq1erY8eOOnnypEIhUigA3pVSt7YkqUlSnPp2bKxFf//G4Y4AAKYkJycrOTlZrVq1kiR169ZNe/bsqfLjVDqZ79Wrl55++mldc801Sk5O1uHDh7V8+XL16tVLmzZtCn9fu3btKrz/mDFjtGHDBl1//fWqU6eOCgsLNXTo0Co3DABuMXdUdyXVidHJQFDjX92gI8dOOt0SAHiG0zvzDRo0UHJysvLy8tS4cWNt3LhRTZs2rfLjVHqY//DDDyVJixYtOu36v79mWZamT59e4f2//vprdejQQXFxcZKk2NhYxcfHV7lhAHCLG5/5xOkWAAAOGjFihKZOnSq/369GjRpV6nWo31fpYT4rK6vKD36quXPnatKkSeHPa9eufdo1AAAAwARfDThnvkWLFpo4ceJPeowqnX8TCASUk5OjgoICJScnq3Xr1oqKiqrUfUOhULm/zvD5fAoEAlXrFgAAAKgGNWGYrw6VHub37dunSZMmqaysLLwzHx0drbFjx1Zqv+ecc87R0qVLddVVV0mS/vu//1uNGjU6+84BAAAAj6v0MD937lxlZGTo+uuvDyfs7733nl588UVNmDDhjPe/8847NX/+fL3zzjuyLEvt2rXTyJEjz75zAAAA4Cw5/QLY6lLpYT43N1ePPfZYuSd+3XXXnfaC2B9Sv3593X///VXvEAAAAECFKj3MJyUlacuWLeWOnty6desZz5l/99131b9/f82bN6/Cr48YMaKyLQAAAADVwnM784MHD9akSZPUuXNnpaSkKD8/X+vWrTvjO8A2adJEkpSamvrTOgUAAABQTqWH+by8PD3zzDNatWqVCgsL1axZMw0aNEjr1q370fulp6dL+vYoyu7du5f72urVq8+iZQAAAOCnccnKvHyV/ca3335b5513nm6++Wbdcccduvnmm9W4cWO9/fbblbr/4sWLK3UNAAAAQOWcMZnftGmTJCkYDIY//rd//vOf4Xd0/SHr16/X+vXrVVBQUG5vvrS0VD5fpX+XAAAAAKqNzyXR/BmH+ZkzZ0qSysrKwh9L3x7nU79+/TO+gDUxMVGpqalau3Ztub35uLg4ZWZmnm3fAAAAgOedcZjPysqSJE2fPl333HNPlQu0aNFCLVq0UM+ePSv9brEAAACAndyyH1LpF8CezSAvSc8995weeOABPfTQQxUezv+nP/3prB4XAAAAOFsu2bKp/DB/toYPHy5JGjdunN2lAAAAAE+xfZj/95tKNWzY0O5SAAAAQKV45gWwP9V//Md/lFuvCYVCsiwr/H9feuklu1sAAAAAXMn2Yf6//uu/7C4BAAAAVIlLgnn7h/lT5ebmatu2bZKktm3b6mc/+5nJ8gAAAICrGDuVZ+nSpZo2bZqOHDmiI0eOaOrUqVq2bJmp8gAAAECYz7L3ZoqxZP5vf/ubnnrqKcXGxkqS+vfvr0cffVTXXnutqRYAAAAAVzE2
zIdCIfl83/1FgM/nUygUMlUeAAAACOM0myq6/PLL9cgjj6hLly6SpDVr1uiKK64wVR4AAABwHWPDfL9+/ZSWlhZ+Aezo0aN1/vnnmyoPAAAAhLkkmLd/mC8rK9OHH36oAwcOqHnz5rr66qsVFRVld1kAAADA9Wwf5rOyshQVFaW2bdtq/fr12rdvn4YNG2Z3WQAAAOAHmTxxxk62D/N79+7V5MmTJUlXXHGFHn74YbtLAgAAAD/KkjumedvPma9V67vfF1ivAQAAAKqP7cl8bm6uMjMzJX17PGVZWZkyMzMVCoVkWZZeeuklu1sAAAAAymHNppLeeOMNu0sAAAAAnmTsaEoAAACgpnBLMm/7zjwAAAAAe5DMAwAAwHMsl7xrFMk8AAAAEKFI5gEAAOA57MwDAAAAcBTJPAAAADzHJSvzDPMAAADwHp9LpnnWbAAAAIAIRTIPAAAAz+EFsAAAAAAcRTIPAAAAz3HJyjzJPAAAABCpSOYB1Ai5M3/pdAswLLHLPbrguk+cbgOG7frrw063AEiSfHJHNE8yDwAAAEQoknkAAAB4DjvzAAAAABxFMg8AAADP4Zx5AAAAAI4imQcAAIDn+FyyNM8wDwAAAM9xySzPmg0AAAAQqUjmAQAA4DluWbMhmQcAAAAiFMk8AAAAPMclwTzJPAAAABCpSOYBAADgOW5JtN3yPAAAAADPIZkHAACA51guWZonmQcAAAAiFMk8AAAAPMcduTzDPAAAADyIN40CAAAA4CiSeQAAAHiOO3J5knkAAAAgYpHMAwAAwHNcsjJPMg8AAABEKpJ5AAAAeA5vGgUAAADAUSTzAAAA8By3JNpueR4AAACA55DMAwAAwHPYmQcAAADgKJJ5AAAAeI47cnmGeQAAAHgQazYAAAAAHEUyDwAAAM9xS6LtlucBAAAAeA7JPAAAADyHnXkAAAAAjiKZBwAAgOe4I5cnmQcAAAAiFsk8AAAAPMclK/Mk8wAAAECkIpkHAACA5/hcsjVPMg8AAABEKJJ5AAAAeA478wCAn+TxR8erd8/uGtC/n9OtwCazJtymrz/6o9YufDh8bUBGR33x1iP61xdT1SmtuYPdAd5m2fyPKQzzAOCQ/jcO0MzZc51uAzZ6+f3/Vf+AWPynAAAUz0lEQVS7s8pd27wrT7f+Zo4+XbfLoa4A1BTBYFAPPfSQJk6ceNaPYWyY37ZtW6WuAYBXdE7vonr16zvdBmz02bpdKjhyrNy1nD3/1I6vDzrUEYB/syx7b5WxdOlSNWnS5Cc9D2PD/Pz58yt1DQAAAHC7w4cPa926derTp89PehzbXwC7fft25eTkqLi4WEuWLAlfP3bsmILB4A/eLzs7W9nZ2ZKk9PR09ejRw+5WAQAA4BF2H0156iwrSRkZGcrIyAh/vmDBAg0dOlSlpaU/qY7tw7zf79fx48cVCATKNRsfH68HHnjgB+936hPOy8uzu00AAACg2nx/eD/VF198ofr16ys1NVWbN2/+SXVsH+bT0tLUpk0b/eMf/9DAgQPtLgcAAACckZNHU+bk5Gjt2rVav369ysrKVFpaqqlTp+q+++6r8mMZOWfe5/OppKTERCkAiBhjf/uA1q75u4qKCnXlFb006u57NeBmQg83eemPw9SzcyulNEjQzg+e1JOzlqrwyL/03NiBSklM0DtT79JXOft0w/dOvAHgbkOGDNGQIUMkSZs3b9b7779/VoO8ZPBNo84//3xNmjRJ3bt3V+3atcPXu3btaqoFAKhRJv3pOadbgM0yxy+o8Pp7H39lthEAp3HLm0YZG+ZLSkpUt25dbdq0qdx1hnkAAAB41YUXXqgLL7zwrO9vbJgfPXq0qVIAAADAjzL5Lq12MjbMHz58WPPmzVNOTo4sy9LPf/5zDR8+XMnJyaZaAAAAACRJPnfM8ubeNGrGjBlKT0/X7NmzNWvWLKWnp2vGjBmmygMAAACuY2yYLy4u1uWXX66oqChFRUWpd+/eKi4uNlUe
AAAACLNs/scUY8N8vXr1tHLlSgWDQQWDQa1cuVJ169Y1VR4AAABwHWM786NGjdKLL76ol156SZL085//XKNGjTJVHgAAAAjjaMoqSklJ0dixY02VAwAAAFzP2DD/z3/+U/Pnz9eOHTtkWZZat26tzMxMnXPOOaZaAAAAACS552hKYzvzU6dOVY8ePfTnP/9Zs2fPVrdu3TRlyhRT5QEAAADXMTbMh0Ih9erVK3yaTa9evWS5ZVkJAAAAEcVn2XszxdiazYUXXqjFixerR48esixLq1atUseOHVVSUiJJSkhIMNUKAAAA4ArGhvlVq1ZJkrKzsyV9m9RL0scffyzLsjR9+nRTrQAAAMDj3LIzb/swv3PnTqWkpCgrK0uS9Mknn+jzzz9Xw4YNNWjQIBJ5AAAA4CzZvjM/Z84c1ar17e8MW7Zs0WuvvabLLrtM8fHxmj17tt3lAQAAgNNYlr03U2wf5oPBYDh9X7Vqlfr06aNu3brp1ltv1YEDB+wuDwAAAJzGsvlmipFhPhAISJI2bdqkdu3alfsaAAAAgLNj+878pZdeqt/97neqW7euYmJi1LZtW0nSgQMHFB8fb3d5AAAA4DQ+lxyRbvswP2DAALVr105FRUXq0KFD+Gz5YDCo4cOH210eAAAAcC0jR1O2bt36tGuNGzc2URoAAAA4jTtyeYPvAAsAAACgehl70ygAAACgxnBJNE8yDwAAAEQoknkAAAB4juWSaJ5kHgAAAIhQJPMAAADwHJccM88wDwAAAO9xySzPmg0AAAAQqUjmAQAA4D0uieZJ5gEAAIAIRTIPAAAAz+FoSgAAAACOIpkHAACA57jlaEqSeQAAACBCkcwDAADAc1wSzJPMAwAAAJGKZB4AAADe45JonmQeAAAAiFAk8wAAAPAct5wzzzAPAAAAz+FoSgAAAACOIpkHAACA57gkmGeYR8107eytTrcA47bq88f6ON0EDCpcM93pFuCAgoN5TrcAuArDPIAag//Ie0tSo8ZOtwDAy1wSzbMzDwAAAEQoknkAAAB4jluOpiSZBwAAACIUyTwAAAA8h3PmAQAAADiKZB4AAACe45JgnmEeAAAAHuSSaZ41GwAAACBCkcwDAADAcziaEgAAAICjSOYBAADgORxNCQAAAMBRJPMAAADwHJcE8yTzAAAAQKQimQcAAID3uCSaJ5kHAAAAIhTJPAAAADyHc+YBAAAAOIpkHgAAAJ7jlnPmGeYBAADgOS6Z5VmzAQAAACIVyTwAAAC8xyXRPMk8AAAAEKFI5gEAAOA5HE0JAAAAwFEk8wAAAPActxxNSTIPAAAARCiSeQAAAHiOS4J5knkAAAAgUpHMAwAAwHtcEs0zzAMAAMBzOJoSAAAAgKNI5gEAAOA5HE0JAAAAwFEk8wAAAPAclwTzJPMAAABApCKZBwAAgPe4JJonmQcAAAAiFMk8AAAAPIdz5gEAAAA4imQeAAAAnuOWc+YZ5oEaYmj3ZhrQqbFCIWnHwRI9vniryvxBp9sCUI0ef3S8Vq74RElJyXrn3SVOtwPABRjmgRqgUd3aGtK1mW6a/r864Q/qmYHtdE27c/Tehv1OtwagGvW/cYAGDxmqR8aPdboVwPOcDubz8/OVlZWloqIiWZaljIwM9e3bt8qPY2yYD4VCKi4uViAQCF9LSkoyVR6o8aJ8lmpH++QPhhQXHaVDR0843RKAatY5vYv27dvrdBsA5PyaTVRUlG6//XalpqaqtLRU48aNU4cOHdS0adMqPY6RYX758uV68803lZCQIJ/vu9fcPv/88ybKAzXewaMn9NKqf2j5mEt13B/U6l0FWr2rwOm2AACATRITE5WYmChJiouLU5MmTVRQUFAzh/klS5bo+eefV7169UyUAyJO3dhauvznKer7wiodPe7Xs4Pa67oO5+qvXx1wujUAAFzK6UWb7xw8eFB79uxRy5Ytq3xfI8N8cnKyEhISqnSf7OxsZWdnS5LS09PVo0cPO1oDaoRu
qUnaV3RchcdOSpI+2npQFzWrzzAPAECEOnWWlaSMjAxlZGSc9n3Hjx/X5MmTNWzYMMXHx1e5jq3D/NKlSyVJ5557rp544gl17txZtWp9V/LHlvxPfcJ5eXl2tgk47sCR4+rQtJ5io306fjKorqlJ2pJX7HRbAAC4lt078z80vJ/K7/dr8uTJ6tmzp7p27XpWdWwd5ouLvx1GGjRooAYNGujYsWN2lgMi1sZ9xfpwy0G9PvISBYIhbTtwVG+t3ed0WwCq2djfPqC1a/6uoqJCXXlFL426+14NuHmg020BcEAoFNKsWbPUpEkT9evX76wfxwqFQqFq7MsWJPPec+3srU63AAcsG9nW6RZgUFKjxk63AAcUHOS/6V7SuHHN/fc8r6jM1sdv3CDmR7++bds2Pf7442revLms//9rgsGDB6tTp05VqmNkZ/6pp57S/fffrzp16kiSSkpKNG3aNI0fP95EeQAAAKBGadOmjd58882f/DhGhvmioqLwIC9JCQkJKiwsNFEaAAAAOI3T58xXF9+Zv6Uaivh8Onz4cPjz/Px8E2UBAAAAVzOSzN9yyy167LHH1K5dO0nS5s2bdccdd5goDQAAAJzGqkHnzP8URob5Tp066emnn9b27dslSbfddpvq169vojQAAADgWkbWbKRv0/i9e/fqkksu0cmTJ7V7925TpQEAAIDyLJtvhhgZ5l988UVt3rxZ//M//yNJio2N1Zw5c0yUBgAAAE7jklnezDC/fft2/epXv1J0dLSkb0+z8fv9JkoDAAAArmVkZz4qKkrBYDB8IP7Ro0fDHwMAAACmuWUUtXWYDwQCioqK0tVXX63JkyeruLhYb775plavXq1f/vKXdpYGAAAAXM/WYf7hhx/WpEmTdNlllyk1NVUbN25UKBTSmDFj1Lx5cztLAwAAAD+IoykrIRQKhT9u1qyZmjVrZmc5AAAAwFNsHeaLi4u1ZMmSH/x6v3797CwPAAAAVMwdwby9w3wwGNTx48fLJfQAAAAAqoetw3xiYiIvdAUAAECN45Jg3t5z5knkAQAAAPvYmsw//vjjdj48AAAAcFY4Z74SEhIS7Hx4AAAA4Ky45WhKW9dsAAAAANjH1mQeAAAAqIncsmZDMg8AAABEKIZ5AAAAIEIxzAMAAAARip15AAAAeA478wAAAAAcRTIPAAAAz+GceQAAAACOIpkHAACA57AzDwAAAMBRJPMAAADwHJcE8wzzAAAA8CCXTPOs2QAAAAARimQeAAAAnsPRlAAAAAAcRTIPAAAAz+FoSgAAAACOIpkHAACA57gkmCeZBwAAACIVyTwAAAC8xyXRPMk8AAAAEKFI5gEAAOA5bjlnnmEeAAAAnsPRlAAAAAAcZYVCoZDTTeCHZWdnKyMjw+k2YBA/c+/hZ+49/My9h5857EIyX8NlZ2c73QIM42fuPfzMvYefuffwM4ddGOYBAACACMUwDwAAAEQohvkajv067+Fn7j38zL2Hn7n38DOHXXgBLAAAABChSOYBAACACMUwDwAAAEQo3gG2htqwYYPmz5+vYDCoPn366MYbb3S6JdhsxowZWrdunerXr6/Jkyc73Q5slp+fr6ysLBUVFcmyLGVkZKhv375OtwUblZWVacKECfL7/QoEAurWrZsGDRrkdFswIBgMaty4cUpKStK4ceOcbgcuwzBfAwWDQb344ot69NFHlZycrPHjxys9PV1NmzZ1ujXYqHfv3rrmmmuUlZXldCswICoqSrfffrtSU1NVWlqqcePGqUOHDvx77mLR0dGaMGGCYmNj5ff79fjjj+viiy9W69atnW4NNlu6dKmaNGmi0tJSp1uBC7FmUwPt3LlT5557rs455xzVqlVLPXr00Jo1a5xuCzZLS0tTQkKC023AkMTERKWmpkqS4uLi1KRJExUUFDjcFexkWZZiY2MlSYFAQIFAQJZlOdwV7Hb48GGtW7dOffr0cboVuBTJfA1UUFCg5OTk8OfJycnasWOHgx0B
sNPBgwe1Z88etWzZ0ulWYLNgMKixY8fqwIEDuvrqq9WqVSunW4LNFixYoKFDh5LKwzYk8zVQRaeFkt4A7nT8+HFNnjxZw4YNU3x8vNPtwGY+n0/PPvusZs2apV27dukf//iH0y3BRl988YXq168f/ls4wA4k8zVQcnKyDh8+HP788OHDSkxMdLAjAHbw+/2aPHmyevbsqa5duzrdDgyqU6eO0tLStGHDBjVv3tzpdmCTnJwcrV27VuvXr1dZWZlKS0s1depU3XfffU63BhdhmK+BLrjgAu3fv18HDx5UUlKSVq1axb/4gMuEQiHNmjVLTZo0Ub9+/ZxuBwYUFxcrKipKderUUVlZmTZu3Kj+/fs73RZsNGTIEA0ZMkSStHnzZr3//vv89xzVjmG+BoqKitKIESP01FNPKRgM6vLLL1ezZs2cbgs2e+GFF7RlyxYdPXpUd911lwYNGqQrrrjC6bZgk5ycHK1cuVLNmzfXgw8+KEkaPHiwOnXq5HBnsEthYaGysrIUDAYVCoXUvXt3de7c2em2AEQ4K1TRgjYAAACAGo8XwAIAAAARimEeAAAAiFAM8wAAAECEYpgHAAAAIhTDPAAAABChGOYBwKCsrCy9/vrrkqStW7fq17/+tZG6gwYN0oEDB4zUAgCYwzAPAA5p27atpkyZcsbv++STT/TYY48Z6AgAEGkY5gHgLAUCAadbAAB4HO8ACwDfc/fddysjI0MrV65UUVGRunTpojvuuEM7duzQtGnTdM011+ivf/2rOnTooHvvvVdffPGFXn/9dR06dEhNmzbVnXfeqZ/97GeSpD179mjWrFnav3+/OnbsKMuywnU2b96sadOmadasWZKk/Px8LViwQFu3blUoFNKll16qq6++WnPmzJHf79ftt9+uqKgoLViwQCdPntRrr72m1atXy+/3q0uXLho2bJhiYmIkSe+9956WLFkiy7J0yy23mP9DBAAYQTIPABX49NNP9cgjj2jatGnav3+/3nnnHUlSUVGRSkpKNGPGDI0cOVK7d+/WzJkz9atf/Urz5s1TRkaGnnnmGZ08eVJ+v1/PPvusevbsqXnz5ql79+76/PPPK6wXDAY1adIkpaSkKCsrS7NmzdKll14a/uWgdevWevnll7VgwQJJ0iuvvKL9+/fr2Wef1dSpU1VQUKC33npLkrRhwwa9//77evTRRzVlyhRt3LjRyJ8ZAMA8hnkAqMDVV1+tlJQUJSQk6KabbtJnn30mSbIsS4MGDVJ0dLRiYmL00UcfKSMjQ61atZLP51Pv3r1Vq1Yt7dixQ9u3b1cgENB1112nWrVqqVu3brrgggsqrLdz504VFBTo9ttvV2xsrGJiYtSmTZsKvzcUCumjjz5SZmamEhISFBcXpwEDBoR7XLVqlXr37q3mzZsrNjZWAwcOtOcPCQDgONZsAKACKSkp4Y8bNmyogoICSVK9evXCqyzSt6sxK1as0AcffBC+5vf7VVBQIMuylJSUVG615tTHPVV+fr4aNmyoqKioM/ZWXFysEydOaNy4ceFroVBIwWBQklRYWKjU1NRy/QMA3IlhHgAqkJ+fX+7jpKQkSSo3mEtScnKyBgwYoAEDBpz2GFu2bFFBQYFCoVD4focPH9a555572vempKQoPz9fgUDgjAN93bp1FRMTo+eeey7c16kSExN1+PDhCp8LAMBdWLMBgAosX75chw8fVklJiRYtWqTu3btX+H19+vTRhx9+qB07digUCun48eNat26dSktL1bp1a/l8Pi1btkyBQECff/65du7cWeHjtGzZUomJiXrllVd0/PhxlZWVadu2bZKkBg0aqKCgQH6/X5Lk8/nUp08fLViwQEeOHJEkFRQUaMOGDZKk7t2765NPPtHevXt14sQJLVy4sLr/eAAANQTJPABU4Be/+IX+8Ic/qLCwUOnp6br55psrHMQvuOACjRw5UvPmzdP+/fvDu+5t27ZVrVq19Nvf/lazZ8/W66+/ro4dO+qS
Sy6psJ7P59PYsWM1b948jR49WpZl6dJLL1WbNm3Url278AthfT6fXnzxRd12221666239Mgjj+jo0aNKSkrSlVdeqYsvvlgdO3bUddddpyeeeEI+n0+33HKLPv30U7v/yAAADrBCoVDI6SYAoCa5++67NXLkSHXo0MHpVgAA+FGs2QAAAAARimEeAAAAiFCs2QAAAAARimQeAAAAiFAM8wAAAECEYpgHAAAAIhTDPAAAABChGOYBAACACPV/0GU+Uwzx6bIAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 1008x626.4 with 2 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Cross-tabulate each document's true category against its most likely topic\n",
    "predictions = test_docs.topic.to_frame('topic').assign(predicted=topic_probabilities.idxmax(axis=1).values)\n",
    "heatmap_data = predictions.groupby('topic').predicted.value_counts().unstack()\n",
    "sns.heatmap(heatmap_data, annot=True, cmap='Blues');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Resources\n",
    "\n",
    "- pyLDAvis:\n",
    "    - [Talk by the author](https://speakerdeck.com/bmabey/visualizing-topic-models) and [paper by the original authors](http://www.aclweb.org/anthology/W14-3110)\n",
    "    - [Documentation](http://pyldavis.readthedocs.io/en/latest/index.html)\n",
    "- LDA:\n",
    "    - [David Blei's homepage @ Columbia](http://www.cs.columbia.edu/~blei/)\n",
    "    - [Introductory paper](http://www.cs.columbia.edu/~blei/papers/Blei2012.pdf) and [more technical review paper](http://www.cs.columbia.edu/~blei/papers/BleiLafferty2009.pdf)\n",
    "    - [Blei Lab @ GitHub](https://github.com/Blei-Lab)\n",
    "- Topic coherence:\n",
    "    - [Exploring Topic Coherence over many models and many topics](https://www.aclweb.org/anthology/D/D12/D12-1087.pdf)\n",
    "    - [Paper on various methods](http://www.aclweb.org/anthology/N10-1012)\n",
    "    - [Blog post: overview](http://qpleple.com/topic-coherence-to-evaluate-topic-models/)\n"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "hide_input": false,
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  },
  "name": "_merged",
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": true,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "203.153px",
    "left": "69.9915px",
    "right": "1064px",
    "top": "66.3352px",
    "width": "302px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
